Efficient reconstruction of multidimensional random field models with heterogeneous data using stochastic neural networks

Xia, Mingtao, Shen, Qijing

arXiv.org Artificial Intelligence

In this paper, we analyze the scalability of a recent Wasserstein-distance approach for training stochastic neural networks (SNNs) to reconstruct multidimensional random field models. We prove a generalization error bound for reconstructing multidimensional random field models when training stochastic neural networks on a limited number of training data points. Our results indicate that when noise is heterogeneous across dimensions, the convergence rate of the generalization error may not depend explicitly on the model's dimensionality, partially alleviating the "curse of dimensionality" for learning multidimensional random field models from a finite number of data points. Additionally, we improve the previous Wasserstein-distance SNN training approach and showcase the robustness of the SNN. Through numerical experiments on different multidimensional uncertainty quantification tasks, we show that our Wasserstein-distance approach can successfully train stochastic neural networks to learn multidimensional uncertainty models.
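The paper's exact training objective is not reproduced in the abstract; as a rough illustration of the Wasserstein-distance idea it builds on, here is a minimal sketch of the empirical 1-D Wasserstein-1 distance between two equal-size samples, the kind of discrepancy such training minimizes (the function name and setup are illustrative, not the authors' code).

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples.

    In one dimension the optimal coupling matches sorted order statistics,
    so W1 reduces to the mean absolute difference of the sorted values.
    """
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    assert x.shape == y.shape
    return np.mean(np.abs(x - y))

# Identical samples have zero distance; shifting by c gives distance c.
rng = np.random.default_rng(0)
s = rng.normal(size=1000)
print(wasserstein_1d(s, s))        # 0.0
print(wasserstein_1d(s, s + 2.0))  # ≈ 2.0
```

In the multidimensional SNN setting, a sample-based surrogate of this distance (e.g., sliced or entropic-regularized variants) is what makes the objective tractable.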


Energy Approach from $\varepsilon$-Graph to Continuum Diffusion Model with Connectivity Functional

Yang, Yahong, Lee, Sun, Calder, Jeff, Hao, Wenrui

arXiv.org Machine Learning

We derive an energy-based continuum limit for $\varepsilon$-graphs endowed with a general connectivity functional. We prove that the discrete energy and its continuum counterpart differ by at most $O(\varepsilon)$; the prefactor involves only the $W^{1,1}$-norm of the connectivity density as $\varepsilon\to0$, so the error bound remains valid even when that density has strong local fluctuations. As an application, we introduce a neural-network procedure that reconstructs the connectivity density from edge-weight data and then embeds the resulting continuum model into a brain-dynamics framework. In this setting, the usual constant diffusion coefficient is replaced by the spatially varying coefficient produced by the learned density, yielding dynamics that differ significantly from those obtained with conventional constant-diffusion models.
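The discrete energy whose continuum limit the paper studies is not spelled out in the abstract; the following toy sketch computes one common form of a Dirichlet-type energy on an ε-graph over random points (the normalization and the absence of a connectivity density here are simplifying assumptions, not the paper's exact functional).

```python
import numpy as np

def graph_energy(X, u, eps):
    """Discrete Dirichlet-type energy on an eps-graph.

    E(u) = (1 / (n^2 eps^2)) * sum over pairs with |x_i - x_j| < eps
           of (u_i - u_j)^2
    -- one common normalization; the paper's scaling may differ.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D < eps).astype(float)   # unweighted eps-graph adjacency
    np.fill_diagonal(W, 0.0)
    diff2 = (u[:, None] - u[None, :]) ** 2
    return (W * diff2).sum() / (n**2 * eps**2)

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 2))        # points in the unit square
u_const = np.ones(300)
print(graph_energy(X, u_const, 0.1))  # 0.0 -- constants carry no energy
print(graph_energy(X, X[:, 0], 0.1))  # positive for a non-constant function
```

As ε → 0 with enough points, energies of this form approximate a weighted continuum Dirichlet energy, which is the regime the paper's O(ε) bound quantifies.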


Alternatives to the Laplacian for Scalable Spectral Clustering with Group Fairness Constraints

Ojeda-Ruiz, Iván, Ju-Lee, Young, Dickens, Malcolm, Cambisaca, Leonardo

arXiv.org Artificial Intelligence

Recent research has focused on mitigating algorithmic bias in clustering by incorporating fairness constraints into algorithmic design. Notions such as disparate impact, community cohesion, and cost per population have been implemented to enforce equitable outcomes. Among these, group fairness (balance) ensures that each protected group is proportionally represented within every cluster. However, incorporating balance as a fairness metric into spectral clustering algorithms has incurred substantial computational cost, leaving room for improvement. This study aims to enhance the efficiency of spectral clustering algorithms by reformulating the constrained optimization problem using a new formulation derived from the Lagrangian method and the Sherman-Morrison-Woodbury (SMW) identity, resulting in the Fair-SMW algorithm. Fair-SMW employs three alternatives to the Laplacian matrix with different spectral gaps to generate multiple variations of Fair-SMW, achieving clustering solutions with comparable balance to existing algorithms while offering improved runtime performance. We present the results of Fair-SMW, evaluated using the Stochastic Block Model (SBM) to measure both runtime efficiency and balance across real-world network datasets, including LastFM, FacebookNet, Deezer, and German. Fair-SMW runs roughly twice as fast as the state-of-the-art while remaining flexible enough to achieve up to twice the balance.
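The SMW identity at the heart of Fair-SMW is standard linear algebra: a rank-k update to an easily inverted matrix can be inverted with only a k × k solve. A minimal numerical check of the identity (independent of the paper's specific matrices):

```python
import numpy as np

# Sherman-Morrison-Woodbury identity:
# (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}

def smw_inverse(A_inv, U, C, V):
    """Inverse of (A + U C V), given A^{-1}, via the SMW identity."""
    inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)
    return A_inv - A_inv @ U @ inner @ V @ A_inv

rng = np.random.default_rng(2)
n, k = 50, 3
A = np.diag(rng.uniform(1.0, 2.0, size=n))   # diagonal: trivial to invert
U = rng.normal(size=(n, k))
C = np.eye(k)
V = U.T                                      # symmetric rank-k update

direct = np.linalg.inv(A + U @ C @ V)        # O(n^3) dense inverse
fast = smw_inverse(np.diag(1.0 / np.diag(A)), U, C, V)  # only a k x k inverse
print(np.allclose(direct, fast))  # True
```

When fairness constraints enter as a low-rank modification of a spectral problem, this is the mechanism that lets the constrained solve reuse the cheap unconstrained structure.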


Bayesian Calibration and Model Assessment of Cell Migration Dynamics with Surrogate Model Integration

Schenk, Christina, Jiménez, Jacobo Ayensa, Romero, Ignacio

arXiv.org Artificial Intelligence

Computational models provide crucial insights into complex biological processes such as cancer evolution, but their mechanistic nature often makes them nonlinear and parameter-rich, complicating calibration. We systematically evaluate parameter probability distributions in cell migration models using Bayesian calibration across four complementary strategies: parametric and surrogate models, each with and without explicit model discrepancy. This approach enables joint analysis of parameter uncertainty, predictive performance, and interpretability. Applied to a real data experiment of glioblastoma progression in microfluidic devices, surrogate models achieve higher computational efficiency and predictive accuracy, whereas parametric models yield more reliable parameter estimates due to their mechanistic grounding. Incorporating model discrepancy exposes structural limitations, clarifying where model refinement is necessary. Together, these comparisons offer practical guidance for calibrating and improving computational models of complex biological systems.


Cryo-EM as a Stochastic Inverse Problem

Espinosa, Diego Sanchez, Thiede, Erik H, Yang, Yunan

arXiv.org Machine Learning

Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. In this work, we formulate cryo-EM reconstruction as a stochastic inverse problem (SIP) over probability measures, where the observed images are modeled as the push-forward of an unknown distribution over molecular structures via a random forward operator. We pose the reconstruction problem as the minimization of a variational discrepancy between observed and simulated image distributions, using statistical distances such as the KL divergence and the Maximum Mean Discrepancy. The resulting optimization is performed over the space of probability measures via a Wasserstein gradient flow, which we numerically solve using particles to represent and evolve conformational ensembles. We validate our approach using synthetic examples, including a realistic protein model, which demonstrates its ability to recover continuous distributions over structural states. We analyze the connection between our formulation and Maximum A Posteriori (MAP) approaches, which can be interpreted as instances of the discretize-then-optimize (DTO) framework. We further provide a consistency analysis, establishing conditions under which DTO methods, such as MAP estimation, converge to the solution of the underlying infinite-dimensional continuous problem. Beyond cryo-EM, the framework provides a general methodology for solving SIPs involving random forward operators.
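One of the statistical distances the paper uses, the Maximum Mean Discrepancy, has a simple sample-based estimator. The following sketch uses a Gaussian kernel and the biased V-statistic estimator; kernel choice and bandwidth are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased squared Maximum Mean Discrepancy with a Gaussian kernel.

    MMD^2 = E k(x,x') + E k(y,y') - 2 E k(x,y), estimated from samples.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2))           # same distribution as X
Z = rng.normal(loc=3.0, size=(200, 2))  # shifted distribution
print(mmd2(X, Y))  # near 0: matched distributions
print(mmd2(X, Z))  # clearly positive: mismatched distributions
```

In the cryo-EM formulation, a discrepancy of this type is evaluated between observed and simulated image distributions, and its gradient drives the particle-based Wasserstein gradient flow over conformational ensembles.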


Is the Frequency Principle always valid?

Zhai, Qijia

arXiv.org Artificial Intelligence

We investigate the learning dynamics of shallow ReLU neural networks on the unit sphere \(S^2\subset\mathbb{R}^3\) in polar coordinates \((τ,ϕ)\), considering both fixed and trainable neuron directions \(\{w_i\}\). For fixed weights, spherical harmonic expansions reveal an intrinsic low-frequency preference with coefficients decaying as \(O(\ell^{5/2}/2^\ell)\), typically leading to the Frequency Principle (FP) of lower-frequency-first learning. However, this principle can be violated under specific initial conditions or error distributions. With trainable weights, an additional rotation term in the harmonic evolution equations preserves exponential decay, with coefficients of order \(O(\ell^{7/2}/2^\ell)\), again leading to the FP of lower-frequency-first learning. However, as in the fixed-weights case, the principle can be violated under specific initial conditions or error distributions. Our numerical results demonstrate that trainable directions increase learning complexity and can either maintain a low-frequency advantage or enable faster high-frequency emergence. This analysis suggests the FP should be viewed as a tendency rather than a rule on curved domains like \(S^2\), providing insights into how direction updates and harmonic expansions shape frequency-dependent learning.


Unsupervised operator learning approach for dissipative equations via Onsager principle

Chang, Zhipeng, Wen, Zhenye, Zhao, Xiaofei

arXiv.org Artificial Intelligence

Existing operator learning methods rely on supervised training with high-fidelity simulation data, introducing significant computational cost. In this work, we propose the deep Onsager operator learning (DOOL) method, a novel unsupervised framework for solving dissipative equations. Rooted in the Onsager variational principle (OVP), DOOL trains a deep operator network by directly minimizing the OVP-defined Rayleighian functional, requiring no labeled data, and then proceeds in time explicitly through conservation/change laws for the solution. Another key innovation here lies in the spatiotemporal decoupling strategy: the operator's trunk network processes spatial coordinates exclusively, thereby enhancing training efficiency, while integrated external time stepping enables temporal extrapolation. Numerical experiments on typical dissipative equations validate the effectiveness of the DOOL method, and systematic comparisons with supervised DeepONet and MIONet demonstrate its enhanced performance. Extensions are made to cover the second-order wave models with dissipation that do not directly follow OVP.


Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling

Zhang, Liu, Mickelin, Oscar, Xu, Sheng, Singer, Amit

arXiv.org Machine Learning

Method-of-moments (MM) estimation and its variants are classical tools for Gaussian mixture modeling. Among these methods, the generalized method of moments (GMM) improves the statistical efficiency of MM by weighting the moments appropriately. However, the computational complexity and storage complexity of MM and GMM grow exponentially with the dimension, making these methods impractical for high-dimensional data or when higher-order moments are required. Such computational bottlenecks are more severe in GMM since it additionally requires estimating a large weighting matrix. To overcome these bottlenecks, we propose the diagonally-weighted GMM (DGMM), which achieves a balance among statistical efficiency, computational complexity, and numerical stability. We apply DGMM to study the parameter estimation problem for weakly separated heteroscedastic low-rank Gaussian mixtures and design a computationally efficient and numerically stable algorithm that obtains the DGMM estimator without explicitly computing or storing the moment tensors. We implement the proposed algorithm and empirically validate the advantages of DGMM: in numerical studies, DGMM attains smaller estimation errors while requiring substantially shorter runtime than MM and GMM. The code and data will be available upon publication at https://github.com/liu-lzhang/dgmm.
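The diagonal-weighting idea can be illustrated far from the paper's heteroscedastic low-rank mixture setting. This toy sketch estimates the mean of a 1-D Gaussian with known unit variance by minimizing a moment-matching objective whose weights are the inverse sampling variances of each moment statistic, i.e., the diagonal of the full GMM weighting matrix (the model, weights, and grid search are all illustrative assumptions).

```python
import numpy as np

def dgmm_objective(mu, m_hat, w):
    """Diagonally weighted moment objective for X ~ N(mu, 1):
    model moments are E[X] = mu and E[X^2] = mu^2 + 1."""
    m_model = np.array([mu, mu**2 + 1.0])
    return float(np.sum(w * (m_hat - m_model) ** 2))

rng = np.random.default_rng(4)
x = rng.normal(loc=1.5, size=5000)
m_hat = np.array([x.mean(), (x**2).mean()])   # empirical moments

# Diagonal weights: inverse sampling variances of the moment statistics
# (the diagonal of the full GMM weighting matrix, skipping off-diagonals).
w = 1.0 / np.array([x.var(ddof=1) / len(x),
                    (x**2).var(ddof=1) / len(x)])

grid = np.linspace(0.0, 3.0, 3001)
mu_hat = grid[np.argmin([dgmm_objective(m, m_hat, w) for m in grid])]
print(mu_hat)  # ≈ 1.5, the true mean
```

Keeping only the diagonal avoids estimating and inverting the full moment-covariance matrix, which is the source of GMM's extra cost and instability that DGMM sidesteps.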


Bayesian Active Learning of (small) Quantile Sets through Expected Estimator Modification

Abdelmalek-Lomenech, Romain Ait, Bect, Julien, Vazquez, Emmanuel

arXiv.org Machine Learning

Given a multivariate function taking deterministic and uncertain inputs, we consider the problem of estimating a quantile set: a set of deterministic inputs for which the probability that the output belongs to a specific region remains below a given threshold. To solve this problem in the context of expensive-to-evaluate black-box functions, we propose a Bayesian active learning strategy based on Gaussian process modeling. The strategy is driven by a novel sampling criterion, which belongs to a broader principle that we refer to as Expected Estimator Modification (EEM). More specifically, the strategy relies on a novel sampling criterion combined with a sequential Monte Carlo framework that enables the construction of batch-sequential designs for the efficient estimation of small quantile sets. The performance of the strategy is illustrated on several synthetic examples and an industrial application case involving the ROTOR37 compressor model.


Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games

Anahtarci, Berkay, Kariksiz, Can Deha, Saldi, Naci

arXiv.org Artificial Intelligence

We consider the maximum causal entropy inverse reinforcement learning problem for infinite-horizon stationary mean-field games, in which we model the unknown reward function within a reproducing kernel Hilbert space. This allows the inference of rich and potentially nonlinear reward structures directly from expert demonstrations, in contrast to most existing inverse reinforcement learning approaches for mean-field games that typically restrict the reward function to a linear combination of a fixed finite set of basis functions. We also focus on the infinite-horizon cost structure, whereas prior studies primarily rely on finite-horizon formulations. We introduce a Lagrangian relaxation to this maximum causal entropy inverse reinforcement learning problem that enables us to reformulate it as an unconstrained log-likelihood maximization problem, and obtain a solution via a gradient ascent algorithm. To illustrate the theoretical consistency of the algorithm, we establish the smoothness of the log-likelihood objective by proving the Fréchet differentiability of the related soft Bellman operators with respect to the parameters in the reproducing kernel Hilbert space. We demonstrate the effectiveness of our method on a mean-field traffic routing game, where it accurately recovers expert behavior.